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^ ABSTRACT 

i^ We derive an estimator of weak gravitational lensing shear from background galaxy 

c/5 images that avoids noise-induced biases through a rigorous Bayesian treatment of the 

I— —I measurement. The Bayesian formalism requires a prior describing the (noiseless) dis- 

I tribution of the target galaxy population over some parameter space; this prior can be 

J> constructed from low-noise images of a subsample of the target population, attainable 

Lj. from long integrations of a fraction of the survey field. We find two ways to combine 

OO this exact treatment of noise with rigorous treatment of the effects of the instrumental 

point-spread function and sampling. The Bayesian model fitting (BMF) method assigns 

a likelihood of the pixel data to galaxy models (e.g. Sersic ellipses), and requires the 

CO unlensed distribution of galaxies over the model parameters as a prior. The Bayesian 

. . Fourier domain (BFD) method compresses galaxies to a small set of weighted moments 

. 5^ calculated after PSF correction in Fourier space. It requires the unlensed distribution of 

/\ galaxy moments as a prior, plus derivatives of this prior under applied shear. BFD is the 

j^ first shear measurement algorithm that is model-free and requires no approximations or 

ad hoc assumptions in correcting for the effects of PSF, noise, or sampling on the galaxy 

images. These algorithms are good candidates for attaining the part-per-thousand shear 

inference required for hemisphere-scale weak gravitational lensing surveys. BMF has 

the drawback that shear biases will occur since galaxies do not fit any finite-parameter 

model, but has the advantage of being robust to missing data or non-stationary noise. 

Both BMF and BFD methods are readily extended to use data from multiple exposures 

and to inference of lensing magnification. 

Subject headings: gravitational lensing: weak — methods: data analysis 



1. Introduction 

Gravitational lensing reveals the mass distribution of the Universe by detecting deflections of 
photons in the gravitational potential generated by the mass. Along most lines of sight, we can best 
measure the gradient of the deflection, characterized by an apparent shear g and magnification of 
background sources]^ The lensing shear is detectable as a coherent alignment induced on nominally 
randomly-oriented resolved background galaxies. Reliable measurement of this shear opens the door 
to a wealth of astrophysical and cosmological information, including the most direct measures of 



the dark components of the Universe. See Hoekstra Sz Jain (2008) and Weinberg et al. (2012) for 



recent reviews of the power of this weak gravitational lensing technique. 

The full power of the weak lensing technique can only be realized, however, if we are able 
to infer the shear from real image data without significant systematic error. This apparently 
straightforward measurement is complicated by several factors: 

• The shear is weak, amounting to ~ 2% change in a galaxy's axis ratio on a typical cosmological 
line of sight. In a full-sky experiment, it is possible to measure shear with statistical errors 
below 1 part in 10'^ of this 2% — systematic errors must be extremely small else they will 
dominate the error budget. 

• The galaxy is viewed through an instrument (possibly including the atmosphere) which con- 
volves the lensed appearance with a point spread function (PSF) that typically induces larger 
coherent shape changes than the gravitational lensing, and can vary with time and with 
position on the sky. This instrumental effect must be known and removed. 

• The received image of the galaxy is pixelized, meaning it has finite sampling. Even if the 
sampling meets the Nyquist criterion so that the image is unambiguous, our shear extraction 
algorithm must handle the sampling and any other signatures of the detector. 

• The unlensed appearance of any individual galaxy is unknown, and galaxies have an infinite 
variety of intrinsic shapes. No finite parameterization can fully describe the unlensed galaxies. 

• The received image includes photon shot noise and additional noise from the detector. The 
shear inference must maintain exceptionally low bias even when targeting galaxies with signal- 
to-noise ratio S/N < 10-15 if we are to extract the bulk of the shear information available 
from typical optical sky images. 



Heymansetal. (2006), Massey et al. (2007), Bridle et al. (2010), and Kitchinget al. (2012) document 



a series of challenges in which the community was invited to infer the shear from simulated sky 
images, as a means of assessing our abilities to measure shear in the face of the above difficulties. 



^ There are several possible parameterizations for the gradient matrix in terms of shear. We will leave this unspec- 
ified to emphasize that our method is valid for any choice of parameterization. 



These publications also summarize the impressive variety of techniques that have been proposed 
for shear inference. A useful parameterization for errors in shear inference is (using a simplified 
scalar notation) that the measured shear g is related to the true shear via 



5 = (1 +m)g + c. 



(1) 



Huterer et al. (2006) calculate that ambitious weak-shear surveys must obtain multiplicative errors 



\m\ < 10~^ to retain their full statistical power. The additive bias c, which can arise when the PSF 
or other element of the analysis chain is not symmetric under 90° rotation, must be kept below 
~ 10^^'^, which is 100-300 x smaller than the typical PSF ellipticity. The literature contains no 
demonstrations of robust shear algorithms yielding |?7i| < 0.01 at S/N ~ 10. 



Bernstein (2010) demonstrates the ability to attain \m\ < 10 at high S/N, overcoming all 



but the last of the problems itemized above. This "FDNT" method has a rigorous formulation for 
noiseless data. It has, however, proven more difficult to derive a shear inference that is rigorously 



correct in the presence of noise. Refregier et al. (2012) derive the lowest-order noise-induced bias 



in galaxy-shape estimates that are produced via maximum-likelihood fitting to parametric galaxy 
models. But it is not clear that this correction yields the necessary accuracy on real data. The 
most common approach to biases induced by noise (or other systematic errors) has been to use 
simulated sky data to infer m and then apply a correction to the real-sky result. The accuracy of 
this approach is of course limited by the extent to which the simulated data reproduces the salient 



characteristics of the real sky. Kacprzak et al. (2012) propose a somewhat more general scheme of 



calibrating shape-measurement biases vs a few parameters of the galaxy and measurement, e.g. the 
S/N level, resolution, Sersic index, etc., and then applying bias corrections on a galaxy-by-galaxy 



basis. But Zuntz et al. (2013) note that this galaxy-by-galaxy bias correction is inaccurate at low 



S/N, and it is better to determine a single overall correction to the shear using prior high-S/N 
information on a random subsample of the source population. 

All these avenues lead us back to the situation of requiring prior empirical high-S/N infor- 
mation on the underlying source galaxy population in order to produce the noise-bias corrections. 
Once we need an empirical prior on galaxy information, we should look to Bayesian techniques to 



produce a rigorously correct shear estimator. The lensfit method of Miller et al. (2007, lensfit) 
produces a Bayesian posterior distribution P{ei\Di) for the ellipticity of galaxy i given its pixel 
data Di , assuming that galaxies take a known functional form and given a a prior distribution of e 
and the other parameters of the presumed galaxy model. But in lensfit the inference of applied 
shear g from an ensemble of these posterior densities for galaxy shapes is done using some ad hoc 



weighting and averaging schemes. Testing of the lensfit codes on simulated data in Miller et al. 
(2013) yields |?tt,| ~ 0.1 at S/N = 10, necessitating an empirical multiplicative correction to shears 



derived from real data. 

In this paper we will derive a rigorous Bayesian treatment of the inference of weak shear, not 
just galaxy shapes, from pixel data. The general approach is outlined in Section 2. Then we examine 
three ways to apply this approach to real data: Section ^ examines Bayesian model-fitting (BMF) 



approaches and extends the lensfit Bayesian method to shear estimation. Section [4] then shows 
how to apply Bayesian techniques without assuming a parametric model for unlensed galaxies, 
yielding an adaptation of the venerable Kaiser et al. ( 1995| KSB) weighted-moment method that 
treats convolution, sampling, and noise rigorously. We consider this Bayesian Fourier-domain 
(BFD) method to be our most promising algorithm for high-accuracy shear inference. In Section Is] 
we explore whether the FDNT method that is successful at high S/N can be embedded in a 
Bayesian framework for accuracy at finite S/N. In Sections M and u\ we compare to some extant 
shear-measurement algorithms and conclude. 



2. Bayesian shear estimate 
2.1. General formulation 



We begin by assuming that we wish to infer the posterior distribution of a constant shear g 
from an image containing data D that can be subdivided into statistically independent subsets 
Di, each covering a single galaxy i G {1, . . . , Ng}. For a prior probability P{g) of the shear, the 
standard Bayesian formulation for the posterior is 



Pi9\D) 



P{g)P{D\g) 
P(D) 



Pi9)U 






(2) 



Our fundamental assumption is that the posterior In P(g\D) will be well approximated by a 
quadratic Taylor expansion in g, i.e. the posterior will be Gaussian in g. This assumption will fail 
when the number of galaxies is small and the shear is poorly constrained, but should become valid 
when we combine information from a large number of galaxies on a weak shear. In Appendix |A] 
we give a prescription for including 3rd-order terms in g as perturbations, since terms 0{g^) are 
necessary if the estimate of g is to be accurate to < 1 part in 10^^ for g ~ 0.03. For quadratic 
order, we need the 6 scalars that make up the scalar Pi, vector Qj, and symmetric 2x2 matrix Rj 
defined as: 



Pi = P{Di\g = 0) 

Qi=VgP{Di\g)\g^^ 

Ri=VgVgP{Di\g)\g^^ 

P{Di\g)f^Pi+g-Qi + ^g-Iii-g. 



The dependence of the posterior on g can now be expressed as 
- In P{g\D) « (const) - In P{g) -g.Y^9i + ^g 



y I Q^QI 



Pf 



Rj 

Ti 



■9- 



(3) 
(4) 

(5) 



If we presume that the data are much more informative than the prior on g, we may drop the 
\nP{g) and find that the posterior for the shear is Gaussian with covariance matrix C^ and mean 



g defined via 



y (QiQl 






(6) 
(7) 



Note that we have not made any assumption about the Gaussianity of the hkelihood P{Di\g) for 
each galaxy, nor about any priors, etc.; only about the posterior of the applied shear g. 



2.2. Non-constant shear 

Before describing methods of calculating P{Di\g) and its derivatives, we discuss generalization 
of (pi) to non-constant shear fields. Once the quantities Pi,Q.i, and Rj are calculated for each galaxy, 
the posterior for any shear model can be calculated as long as the model predicts shears in the 
regime where Q holds. Consider a model with parameter vector a which predicts shear values 
gi{a) at each galaxy. We can slightly modify our formulation and proceed as before to derive the 
posterior P{a\D) 

-lnP{a|D)«(coiist)-lnF(o)-5]9i(a).2i + 1^9,(a). V- ^C.Of 



E 



Pf 



Rj 



•ffi(«), (8) 



If gf is a linear function of a (and if the prior is approximated as quadratic), then the solution for 
the maximum-posterior a is a closed-form matrix equation. One potentially interesting application 
is a Fourier decomposition g^ = ^ Re(a^e*''^-^'). If the source galaxies are uniformly distributed 
on the plane and we can make the approximation that 

QjQi _ R;J 

p2 p. 



E 



ik-Xi 



for k / 0, 



(9) 



then dependence of the posterior on the Fourier coefficients becomes 

:j:Re 



\xiP(a\D) 



-.E|^"^ 



+ 



-y 



"M • Cg 



• a 



A" 



(10) 



With C~ as from ml. This posterior separates into a Gaussian over each 2-component a^ with 
identical covariance 2C~ on each of the real and imaginary parts of each Fourier coefficient and 
a simple 2x2 matrix solution for the most probable a^. A similarly simple posterior could be 
derived for any decomposition of the shear field into orthogonal functions. 

Bayesian estimation of A^-point correlations of a shear field should also be similarly straight- 
forward, at least in the limit where galaxy shape noise and measurement noise are dominant over 
the sample variance of the shear field. We leave this derivation for future work. 
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3. P{Di\g) via model fitting 

3.1. General Formulation 

If galaxy i is assumed to be fully described by a model with a finite number of parameters, all 
the instrumental signatures are known, and there is a known noise model for the pixel data, then 
one can calculate the likelihood of the full pixel data vector C{Di\e°,Xi,6i). Here we divide the 
galaxy parameters into three classes: 

• e° is a vector of observed parameters that are altered under the action of lensing shear 
g, i.e. the ellipticity of the galaxy model. We presume that there is an exactly known 
transformation under the action of shear from intrinsic source parameters to the observed 
parameters, e° = e^ (B g. Such is the case, for example, when e is the 2-component ellipticity 
of a galaxy with self-similar elliptical isophotes and g is any of the common representations 
of the shear linear transformation matrix. We leave the form of this transformation free at 
this point to accommodate any convention for the definition of the ellipticity e and the shear 
g. We do, however, require that the transformation be reversible: (e © gi) © {—g) = erl 

• Xi is the center of the galaxy, or more generally any parameters whose prior distribution 
P{xi) can be taken as uniform both before and after the application of shear to the sky. We 
hence will not make Xi an argument of our priors. 

• 6i are other parameters of the model, which we take to be invariant under the action of shear 
on the image. Examples would be the half-light radius, surface brightness, and Sersic index 



of a simple elliptical Sersic-profile galaxy model, e.g. as used in Miller et al. (2007). 

The shear-conditioned probability for single galaxy becomes 

P{D,\g) = I del dx, de,C{Di\e'^, Xi, Oi)P{e'^, e,\g), (11) 

where P{e°, 9i\g) is the prior distribution the galaxy parameters given the shear. The shear enters 
the posterior only through this term. Conservation of probability under shear requires that 



Pie°,0\g) = Poie°e-g,9) 



de' 



de° 



(12) 
9 



where Po{e., 0) is the unlensed distribution of galaxy ellipticities, and the last term is the Jacobian 
of the ellipticity transformation e^ = e°© —g. For an isotropic Universe, the unlensed prior should 



^e need not even be 2-diniensional: galaxy models with multiple components of different ellipticities might, for 
example, have 4 or more elements of e. The key is that their transformation under shear must be known. 



be a function only of the amplitude e = |e|, not the orientation. Given the unlensed prior, we can 
make a Taylor expansion in the shear: 



Poie°(B-g,e) 



de" 



de" 



Po(e°, e)+g- Q(e°, 6) + ^g ■ R(e", 0) ■ g + 0{g^). (13) 



Once the transformation e (B g is specified, the functions Q and R can be derived in terms of 
Po{e, 9) and its first two derivatives with respect to e. The quantities needed from each galaxy for 
the Bayesian posterior in Equation ([5| are 



P^ 

Qi 

R, 



de°dxidei£{Di\e°,Xi,ei) < 



R(e°,6', 



> . 



(14) 



Note that Pi would be the Bayesian evidence for Di in the absence of lensing shear. 
The operative procedure for obtaining the Bayesian shear estimate is: 

1. Determine the unlensed prior PQ{e,0) from a high-S/N imaging sample, and the Taylor 



expansion in Equation ( 13 ) 



2. For each observed galaxy, compute the six distinct integrals in (14) with the likelihood over 
the ellipticity and structural parameters to get Pi,Qi, and Rj. 

3. Sum over galaxies to obtain C~ in Equation mn. 



4. Sum over galaxies to obtain the shear estimate g as in Equation ([7]). The shear posterior has 
this mean (and maximum) and covariance matrix Cg. 



Note that the division by Cg occurs after the summation over galaxies, i.e. we do not generate 
ellipticity or shear estimators on a galaxy- by- galaxy basis. Also note that for galaxies with very 
noisy data such that C{Di\e°, Xi, 6i) is flat and uninformative relative to the unlensed prior, inte- 
gration by parts will demonstrate that Qj and Rj both tend to zero. The low-5/A^ galaxy hence 
has no influence on the shear likelihood. It will therefore be unnecessary to make cuts on galaxy 
size or S/N ratio to obtain a successful measurement. As long as the source selection is made on a 
quantity (such as total flux) that is unaffected by galaxy shape or lensing shear, we avoid selection 
biases. 

The only approximation made in this derivation was that of weak shear, namely that Equa- 



tion ( 13 ) is accurate over the range of g permitted by the data. If the unlensed ellipticity distribution 



is characterized by a scale of variation (Tg, this means we are assuming that g <^ ae. 

Extension to multiple exposures of the same galaxy is trivial: the pixel data Di for galaxy i 
may be the union of A^ distinct exposures' information, Di = {Dn, Di2, . . . , -Djat}. Assuming that 



different exposures have statistically independent errors, we have 

N 

jC{Di\e°, X.,, 9i) = H C{Dij\e°, x,, Oi). (15) 

i=i 

If the exposures are in different filter bands, then the most general formulation is that Cj and Gi 
are the unions of distinct structural parameters Cjj and Oij for each band j, and the prior must 
specify the joint distribution of galaxies' appearances in all observed bands. 

An alternative formulation to the above is to integrate over the source ellipticity e^ instead of 
the lensed ellipticity e°, which puts the derivatives with respect to shear in the likelihood instead 
of the prior: 

P{D,\g)= fde'idxid6^C{Di\eleg,Xi,e,)Po{eie,\g). (16) 

In this case the quantities needed for the shear posterior are 

Pi= f dei dxi d9iC{Di\el 6>i)Po(e|, 6*,) 

Q, = j deidxidOi \VgC{Di\el®g,ei)\g^^ Po{e!,ei) (17) 

Ki= del dXi dOi \VgVgC{Di\el ® g, Oi)\g^Q^ Po(e- , Oi) 



We would expect Equations (14) to be the more computationally efficient approach, since the 



derivatives of the prior with respect to shear can be pre-calculated once, whereas (17) require 
calculating 5 shear derivatives of the likelihood for every target galaxy. Furthermore the isotropy 
and parity symmetries of the unlensed sky simplify the shear derivatives of the prior. 



3.2. Simplified demonstration 

3.2.1. Adopted model 

A simplified model demonstrates the accuracy and feasibility of the Bayesian shear estimates 
in the presence of highly non-Gaussian likelihoods for the ellipticities of individual galaxies. We 
take an absolute minimal data vector Di for each galaxy to be just a measurement e™" of the 
two-dimensional ellipticity, ignoring centroid and any structural parameters. For each galaxy we: 

• Draw a source ellipticity e'^ from an isotropic unlensed distribution limited to e'^ = le'*! < 1 
and defined by 

Po(e^) « e^ [1 - (e^)']'exp [-{e^f /2al,,,,] . (18) 

The additional factor of (1 — (e*)^)^ ensures the prior is well behaved at the boundary. 



Generate a lensed ellipticity e° = e^ ® g for a constant shear g, using the full non-Euclidean 



transformation for ellipticity under shear e.g. as described by Seitz k, Schneider (1997) 



Obtain a measurement e*" by adding a measurement error 5e drawn from a 2d Gaussian with 
variance of am per axis, except that we truncate the Gaussian error distribution by requiring 



The likelihood C{Di\e") = £(e™|e°) is a known asymmetrically truncated Gaussian. The measure- 
ment error is highly asymmetric with non-zero mean, and strongly dependent upon the intrinsic 
ellipticity, characteristics which induce biases in most extant shear-estimation methods. 



3.2.2. Integration algorithm 



The integrals in (14) are evaluated using a grid-based approach adapted from Miller et al. 
(2013). We define a set of sampled points, starting by evaluating the integrand of Pi in (14) at a 
random point in e° . We construct a square grid with initial resolution Ae centered on the initial 
sample. We then iterate this process: 

1. Evaluate the integrand at all grid points that neighbor existing members of the sample. 

2. Determine the maximum integrand value /max among all sampled points. 

3. Discard any sampled point with integrand < tmin/max- 

We iterate this process until no new samples are added on an iteration. 

If the number of surviving above-threshold samples is > A'mirn we halt the process. Otherwise 
we decrease the grid spacing Ae by a factor of \/2 by adding a new grid point at the center of each 
previous grid square. We then repeat the above iteration. The grid is refined until we reach the 
desired number of above-threshold samples or until we reach a minimum resolution. We can now 



calculate the integrals in ( 14 ) for each galaxy by summing over the sampled points e 



ty 



P. = ^(Ae)2p(er|e°,.)^o(e°,). (19) 

and evaluate Qj and Rj by summing over the same samples, reusing the calculated C{D = e'^\e°), 



changing only the last terms as per the integrands in (14). 



3.2.3. Results 

For all tests we fix cxprior = 0.3, g = (0.01,0), and A^min = 50. For each parameter set we 
simulate 1.5 x 10^ galaxies to reduce the shape noise below our desired accuracy. 



10 



Fig.fllshows the relative error |^ — Sfl/lfirl for different values of tmin and am- The shaded region 
is the desired relative shear error of < 1 x 10^'^. We recover g to this desired precision for all values 
of am as long as tmin is below lO""^, even when the shear measurement error is as large as am = 0.5, 
as one might obtain for real galaxies with S/N ~ 4. The number of likelihood evaluations per 
galaxy is between 50 and 200 for all cases tested. 
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Fig. 1. — The accuracy to which we are able to recover the reduced shear g in our toy model is 
shown as a function of probability threshold imin (below the maximum) above which we keep points 
for the integration. The different curves show the accuracy for different ellipticity measurement 
errors am- The shaded region shows the desired level of accuracy on g. 

The toy model illustrates the validity of the weak-shear Bayesian formalism in the face of 
non-Euclidean shear transformations and messy (but known) shape measurement errors. Another 
experiment with the toy model was to eliminate the [1 — (e**)^]^ term from (18), leaving a discontin- 
uous truncation at e^ = 1 in the unlensed ellipticity prior. In this case we found biases of w 0.5% 



in the recovered shear, which are attributable to the failure of the Taylor expansion in (13) when 
the prior has features on scales smaller than \g\. 



3.3. Errors from unrecognized structural parameters 



The primary difficulty of implementation of these Bayesian shear measurement methods will be 
the need to construct high-dimensional priors. We will be tempted to reduce the dimensionality of 
the galaxy parameter space. Under what circumstances can we omit a parameter from our Bayesian 
calculation and still obtain a rigorously correct result? Consider the simple case of a parameter 
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a. which can take discrete values ai, 02, • • • with probabihty |?i,p2; • • • • The posterior contribution 
from a single galaxy is 



P{Di\g) oc I de"de Y,^{Di\e° ,e,aj)P{e" ,e\g,aj)pj. 



(20) 



If we are unaware of this parameter or choose to ignore it, we will have a prior marginalized over a 
and probably assign a likelihood that is also an average over the a cases. Our posterior calculation 
will then yield 



P{D,\g) ex J de° dO I 5^p,£(A|e°,0,a,) j K;p,P(e°, 0|g, a,) J 



(21) 



This can differ from the correct (20) unless either the prior or data likelihood is independent of a. 
In other words: the prior must specify the distribution of all galaxy parameters that the likelihood 
depends upon. 



We illustrate such bias by dividing the galaxies in our toy model into two populations A and 
B with distinct iTprior and a-m- Galaxies are selected at random with an equal probability of being 
from population A or B. An observer ignorant of the existence of Type A and Type B galaxies 
would infer a prior and a measurement-error distribution that are found by averaging over the full 
population as in (21). Fig. p] shows the accuracy of the resultant shear estimate when we choose 
CpriorA = 0.1, (TpriorB = 0.3, (JmA = 0.1, and Vary (JmB- As expected a shear bias (up to 0.5%) 
appears as amB becomes more distinct from amA- 

For this Bayesian model-fitting method, an example of a galaxy parameter that can be ignored 
is color, which does not affect the likelihood function of single-band pixel data as long as the other 
parameters {e,x,6} completely specify the single-band appearance of the galaxy. The accuracy 
of the Bayesian inference may be improved, however, if we can include measured color in the 
data vector and include color dependence in the prior, e.g. by distinguishing late- and early-type 
galaxies. 

In model-fitting, ignoring parameters that do affect the likelihood of pixel values can produce 
bias. An example would be assuming a fixed bulge-to-disk ratio when fitting a population of galaxies 
that has varying ratios. 

Unfortunately there is no known finite parameterization for real galaxies' appearances, and 
hence it is not possible to create a formally correct Bayesian model-fitting shear method for real 
galaxies. The means by which incomplete galaxy models can bias shear measurements are illustrated 



and explained by Voigt fc Bridle (2009), Melchior fc Viola (2012), and Bernstein (2010) 
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Fig. 2. — The multiplicative shear bias induced by the unrecognized presence of two galaxy popu- 
lations A and B is plotted vs the measurement error amB of the B population. The A population 
has measurement error amA = 0.1. The intrinsic ellipticity dispersions for A and B galaxies are 0.1 
and 0.3, respectively. The shear bias from the unrecognized structural parameter vanishes when 
the error distributions match and C{D\e°) becomes independent of galaxy type. 



4. P{Di\g) via data compression 

4.1. General Formulation 

Acknowledging that we cannot construct a model of galaxy structure with which to predict 
all pixel values, we can instead compress the pixel data into a small number of quantities that 
carry most of the information on any shear applied to the source. We will call the compressed 
quantities "moments" since we propose that they be intensity-weighted moments. This method will 



hence resemble the venerable Kaiser et al. ( 1995 ) shear-measurement methodology, but with critical 



changes to eliminate the approximations inherent to KSB and a rigorous Bayesian formulation to 
eliminate noise-induced biases. 

We choose to reduce the pixel data for any galaxy image to a small vector M = {Mi, M2, ■ ■ ■ , Mm} 
of derived quantities which we will select to be sensitive to shear. The M must be chosen such that 
one can propagate the pixel noise model to the moment vector. If Mi are the moments calculated 
from the pixel data for galaxy i, we must be able to assign a likelihood C{Mi\M) of producing 
the measured moments given that the galaxy has true (noiseless) moments M. In this case the 
contribution to the shear posterior from galaxy i is 



P{Di\g)(x I dMdxC{M,\M)P{M\x,g)= f dM C{Mi\M) ldxP{M\x,g) 



f22) 
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where P{M\x,g) is the prior distribution of moments of a galaxy centered at x given a local shear 
g. The Taylor expansion of the prior can be written as 

/ dx P{M\x, g) = Po{M) + g • Q{M) + ^g • R(M) • g + 0{g^). (23) 

and then the quantities needed from each galaxy for the Bayesian posterior in Equation ([5]) are 



P 

Q^ 

R, 



dM C{Mi\M) < 



Po(M) 
Q{M) 
R(M) 



(24) 



Now the operative procedure for obtaining the Bayesian shear is: 
1. Determine the prior P{M\x^g) from a high-S'/A^ imaging sample, and perform the Taylor 



expansion in Equation (23). The derivatives of this prior under shear must be obtained by 



simulating the action of shear on each member of the high-S'/iV ensemble defining the prior. 
There is no longer any explicit ellipticity parameter describing each galaxy to represent the 
shear-dependent aspects of that galaxy. Appendix [B] shows how to calculate these derivatives 
for the set of moments we will choose below. 

2. For each observed galaxy, measure the moments Mi of that galaxy about some pre-selected 
coordinate origin and determine the likelihood function C{Mi\M). We have no further need 
of the pixel data for the galaxy after this compression. 



3. Compute the six distinct integrals in (24) over the moment space M to get Pi,Qi, and Rj. 



This is the computationally intensive step. 

4. Sum over galaxies to obtain C~^ in Equation Am. 

5. Sum over galaxies to obtain the shear estimate g as in Equation ph. The shear posterior has 
this mean (and maximum) and covariance matrix Cg. 



4.2. A specific choice of data compression 

To make things more concrete, we propose that the compressed quantities M be intensity- 
weighted moments of the galaxy image. We will evaluate these in Fourier domain where, in the 
absence of aliasing, the exact correction for the effect of the point spread function (PSF) on the ob- 
served galaxy image is a simple division. This yields our Bayesian Fourier-domain (BED) algorithm. 
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The moment vector could be 
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> . 



(25) 



Here I°{k) and T{k) are the Fourier transforms of the observed image and the PSF, respectively, and 
VF(|fc I) is a window function applied to the integral to bound the noise, in particular confining the 
integral to the finite region of k in which T{k) is non-zero, as detailed in Bernstein (2010). There 



is a specific weight that will offer optimal S/N for a given galaxy+PSF pair, but any properly 
bounded weight with finite 2nd derivatives produces a valid shear method. One can select a single 
weight function for a full survey and retain good S/N as long as the PSF does not vary widely in 
size. In real data the integrals revert to sums over the fc-space values sampled by a discrete Fourier 
transform (DFT) of the pixel data Di. 

The first motivation for this choice is that these moments are linear in the observed pixel 
data and therefore meet the requirement that we be able to construct a moment noise model 
C{Mi\M) from the known pixel noise model. In fact if the noise in the pixels is independent of 
the pixel values, then the covariance matrix Cj of the moments is independent of the value of M, 
and also independent of any other properties of the galaxy. Such is the case for the background- 
limited conditions typical of faint-galaxy imaging. If the noise is stationary, than all galaxies with 
the same PSF will have the same Cj. Even if the pixel noise is not Gaussian, the central limit 
theorem implies that the moments, which are sums over many independent pixels, will tend toward 
a Gaussian distribution. We therefore can take 



-2lnC{Mi\M) = ln|27rC,| + (M^ - M) • C^^ -{Mi-M). 



(26) 



Furthermore if the PSF is (nearly) isotropic, then the azimuthal symmetries of our chosen basis set 
guarantee that Cj will be (nearly) diagonal, with the exception of a covariance between the two 
monopole moments Mj and Mr- 

One departure from common shear-measurement practices is that calculating Mi from pixel 
data for i does not involve any iterative procedures such as centroiding, since iteration usually pro- 
duce non-analytic likelihoods. In Section [5] we look for feasible approximations to the propagation 
of errors into some iterative compressed quantities. 

We are also tempted to reduce the dimensionality of our prior and speed up marginalization by 
working with the normalized ratios Mx/Mj, My/Mj, M^/M^, and Mx/Mr that figure prominently 
in KSB, then dropping M/ and Mr from our data vector. The probability distribution for such 
ratios of Gaussian deviates is known ( Melchior &: Violal|2012 ), but depends on the mean value and 
variance of the denominator, so the moment likelihood function would still depend on Mj and Mr. 
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As per the discussion in Section 3^ we would still need to include these quantities in the prior and 
marginalize over them, foiling our plan to simplify the prior. We choose the straightforward path 
of compressing the pixel data to un- normalized moments. 

What is the motivation for choosing this particular set of linear moments? Any choice of well- 
defined compressed data vector M will yield a valid Bayesian shear estimator under this method, 
but the choices will differ in the precision to which they determine the shear g from a given galaxy 
sample. We want M to include quantities that unambiguously capture most of the impact of shear 
upon the pixel image. The quadrupole moments M_|_ and Mx respond at first order to shear. They 
are also sensitive at first order to the galaxy size and flux, and at second order to translation of the 
galaxy image, so we add the monopole and dipole moments Mj, M^, M^, and My, sensitive to flux, 
size, and translation in first order, to yield less degenerate shear information for each galaxy. The 
inclusion of Mr furthermore would admit generalizing the Bayesian formalism to infer the weak 
lensing magnification ^ along with the 2 shear components g. 



4.3. Generating the Prior 

We defer testing of a full implementation of this method for a later publication, but we do 
outline the necessary steps and possible economies. 

The biggest astrophysical challenge will be to produce the prior, which is a function of the 
six M components — which can be reduced to 5 dimensions because the isotropy of the Universe 
requires the unlensed prior to be invariant under coordinate rotation. At each point in the 5- 
dimensional space, we require 5 shear derivatives of the prior as well as the unsheared prior, so the 
we must construct a 6-dimensional function on a 5-dimensional space, and for each source galaxy 
integrate the posterior over all 6 dimensions of M. Note that this BFD method will be much faster 
than the BMF method even at equal dimensionality, because BFD requires the evaluation of only 
one Gaussian C{Mi\M) at each point of the integration, whereas BMF requires evaluation of the 
full likelihood of the pixel data of a model. 

Since galaxy-evolution theory will not be able in the foreseeable future to provide an a priori 
distribution of galaxy appearances, the prior will always have to be empirical. The prior can be 
established by obtaining high-5'/A^ data on a sample of the sky. Note that the M values depend 
upon the choice of weight function W to suit the observational PSF, and implicitly depend upon the 
filter passband used for the observations, so the prior is best constructed from deep integrations 
using the same instrument as the main survey. The number of high-S/N galaxies necessary to 
adequately define the prior is an important issue for future study. 

The prior can be generated by some kernel density estimation over the empirical moments 



Mfj^ where fi indexes the members of the high-S/N template galaxy set. In (22), however, we see 
that the prior is already being convolved with the likelihood function for a target galaxy. Hence if 
the we are using target galaxies of sufficiently low S/N that C{Mi\M) is broad enough to sample 
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many elements of the template set, it is adequate to express the prior as a sum of delta functions 
for each template galaxy: 

P{M\g)^^6^[M-M,{g)], (27) 

where we need M^^g), the moments for the template galaxy if it were sheared by g. The posterior 
contribution for galaxy i becomes 



PiDi\g)^^C[Mi\M^{g)]. 



(28) 



We perform a Taylor expansion on this shear-conditioned probability to obtain our required prop- 
erties: 

Pi = ^£(M,|M^), 
d 



Q^ = E 



dM 



C{Mi\M) 



M„ 



• VgM^, 



(29) 
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where all moments and derivatives are taken at gi = 0. The determination of W gM fj_ from the data 
for template galaxy fj, is straightforward in Fourier domain, and is given in Appendix B. 



Now we adopt the multivariate Gaussian likelihood (26) for the moments. We also can define 



M^{x,(j)) to be the moments that we would assign to template galaxy /i if it were translated to 
X (relative to the target galaxy's coordinate origin) and rotated by (f). Because the unlensed sky 
is isotropic, we can assume that the true prior contains replicas of template galaxy /i at all x and 
0J^ The parity invariance of the unlensed sky also implies that we can place a mirror image of 
each template galaxy in the prior as well. Appendix [B] shows how to calculate moments for these 
transformed versions of a template galaxy. 

For notational simplicity we will subsume the parity flip into the integration over rotation cj). 



We can limit the potential positions x of the template galaxy centroid to be within the range of pixels assigned 
to our target galaxy i, essentially adopting a prior that we have indeed found a galaxy and its center is within our 



postage stamp. This avoids the problem of divergent position marginalization noted in Miller et al. (20131 
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The Taylor-expanded posterior derived from the template sample now becomes: 
-Pi = ^ / dxd(j)Ci^{x,(l)) 

Q, = ^ I dx d<PQ^ix, 0) [M, - M^{x, 0)] • Cri . VgM^{x, 0) (30) 

Il, = Y, [ dx d<pC^^ix, <P) { [Mi - M^{x, </.)] • Cri • VgVgM^{x, <p) 

+VgM^{x, 0) • Cri . VgM^{x, (/>)} 

£,^(a;,(/<) = expj-^ [M, - M^(x,0)] • Cr^ • [Mi - M^(a;,(/.)]| 

The computational challenge of this Bayesian Fourier-domain shear inference is the high mul- 
tiplicity of this calculation: for every target galaxy z, we collect 6 sums over every template galaxy 
/i, with each term of each sum being an integral of a Gaussian function of the three dimensions 
plus parity flip of {x,(j)). There are obvious efficiencies to be gained in this calculation by pruning 
the template set to those with significant contributions to the sums. Furthermore we recall that 
all science will come from sums over a large number of target galaxies, so we can subsample the 
template set when computing the posterior for individual target galaxies, if we can do so without 
inducing systematic biases on (P,, Qj, Rj). 

A substantial speedup of the Bayesian shear calculation is enabled if we can approximate 
M ^ as linearly dependent on a;, in which case two of the three dimensions of the integrals in 



Equations ( 30 ) reduce to linear algebra. Appendix pi shows how to determine these derivatives for 



the template galaxies. 



We leave the testing of a practical implementation of Equations ( 30 ) and the investigation of 
the required size of the template galaxy sample to further work. 



5. P{Di\g) via null tests 

The BFD method improves on the BMF method by eliminating the approximation that target 
galaxies are described by a low-dimensional model. We paid a price, however, in losing the con- 
venience of having the action of shear be fully described by alteration of just the two components 
of e. This allowed us to derive the lensed prior solely from derivatives with respect to e of the 
unlensed prior Po(e) ^)- In the Section we ask whether null-testing methods can be used to make a 
model- free Bayesian shear inference with the simplicity of a known shear transformation e® g. 

Windowed centroiding procedures assign a center a; to a galaxy by translating the galaxy until 
the windowed first moments are nulled. The galaxy is assigned a centroid that is the inverse of 
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the translation needed to null the moments. In the Fourier Domain Null Test (FDNT) method 
( Bernsteiii||2010 ) , this is extended by shearing as well as translating the galaxy (after correcting for 



seeing) until we null the moments Mx,My,M^, and Mx in Equation (25). The galaxy is assigned 



a shape ej that is the inverse of the shear that produces the null. The moment vector for galaxy 
i is hence a function Mi{E) of the four-dimensional transformation E = {g,x) and we assign 
the galaxy a shape and centroid Ei = {ei,Xi) such that Mi(—Ei) = 0. This approach assures a 
well-determined transformation of the measured shape ej under an applied shear g, and there is no 



need of a galaxy model. Bernstein (2010) demonstrates shear inferences errors of < 1 part in 10 



on low-noise data using FDNT. 

The application of rigorous Bayesian formalism to FDNT is foiled, however, because there is no 
straightforward means of propagating the pixel noise model to a likelihood C{Ei\E) of measuring 
a null at Ei when the underlying galaxy has true null at E. It is possible, however, to approximate 
C{Ei\E) in the case where the measured shape and centroid are close to the true ones. At sufficiently 
high S/N of the target galaxy, the likelihood will be confined to such regions and the Bayesian 
formulation with this approximation will become accurate. 



The measured moments Mi for galaxy i when transformed by E can be written as 

Mi{E) = M^{E) + 5Mi{E), 



(31) 



where iW^ is the true underlying moment and 5Mi is the variation induced by measurement noise. 
Since Mi{E) is a linear function of the pixel data, the likelihood Ci [5Mi{Ey\ is calculable and 
close to Gaussian. 



Our first approximation is to linearize M ^ about the E^ that nulls it: 



M^,{E) = Bf,-{E-E 
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dM, 
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E, 



(32) 
(33) 



Second we assume that 6Mi{E) is invariant in the neighborhood of Ei and that this measurement 
error has a multivariate Gaussian likelihood Ci{5M) defined by a covariance matrix Cj and zero 



mean. Then via Equation (31) the nulling transformation Ei is 



{) = Mi{E{) = T>.-5E,^ + 5Mi 



C{Ei\E^) = Ci{-T)^-5E 



i/i. 



ID, 



(34) 
(35) 



If we again construct the prior from a template set of galaxies indexed by /i, having ellipticities 
e^, and uniformly distributed in position x^ with respect to the target galaxy's position, then the 
contribution to the posterior from target galaxy i is the sum over template galaxies: 



PiEi 



9) oc 
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(fx. 



ID, 



exp 



1 



6ElBlCr^U^6E,, 



(36) 
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The integration over the two spatial dimensions x^ of the Gaussian argument E^ = (e^,a;^) is 
analytic. To complete the implementation of the Bayesian framework we need to find the Taylor 



expansion of Equation (36) with respect to g. The right-hand side depends implicitly on the applied 
shear g because the template galaxy's shape e^ is transformed by applied shear and enters into 
6Ei^. To be thorough we should also account for the variation of the derivative vector D^ with 
applied shear. Doing so is straightforward but not instructive so we omit the algebra here. 

We find therefore that there is a high-5/A^ approximation to the Bayesian shear estimator in 
the case of iterative null tests, but that this requires knowing the prior distribution not only of 
the galaxies' ellipticities e but also of the derivatives D of the null tests with respect to shear and 
translation. Hence even in the high-S/N approximation, we have not found a way to produce a 
simpler model- free derivatives of the prior than in BFD. We will therefore choose to pursue BFD 
first, as it demands no approximations beyond the weak-shear Taylor expansion of the posterior 
and does not require searching for nulled moments. 



6. Comparison to other methods 

6.1. Model fitting 

The BMF shear measurement that we propose bears very close resemblance to LENSFIT. lens- 
fit is described as "Bayesian galaxy shape measurement" and differs from our BMF method which 
is derived as a Bayesian shear determination. The consequence is that BMF does not assign shapes 
(ellipticities) to galaxies, instead computing the likelihood- weighted derivatives of the prior for each 
galaxy as per (14), and combining to yield shear as per ^ and ([T]). 



LENSFIT already does the hard computational work of our BMF algorithm, namely the in- 
tegration of the data likelihood times the prior over the parameters of the model space. Hence 
LENSFIT serves as an example of the feasibility of our BMF method. At a minimum level of re- 
alism, the galaxy model must include two ellipticity components e, two centroid components x, 
and 6 including a galaxy flux, size, and a measure of concentration such as the Sersic index, for 7 
parameters. Isotropy reduces the unlensed prior to 6 dimensions. The posterior requires integra- 



tion over 7 dimensions. Miller et al. (2007) reduce the dimensionality of the likelihood integral by 
analytic marginalization over flux (which requires a particular choice of prior) and linearization of 
the model dependence on x. Miller et al. ( |2013 ) use a bulge/disk ratio in place of a Sersic index for 



concentration. Their work hence demonstrates the feasibility of computing the Bayesian integrals 
over a 7-dimensional model space. The lensfit implementations simplify the prior substantially 
by separating the dependence on variables. A numerical approach to constructing an accurate 
fully-coupled prior from high-5/A^ observations remains to be demonstrated. 

There are reasons to suspect that this minimal model does not describe the true galaxy popu- 
lation sufficiently well to achieve \m\ < 10~^, in particular because galaxies with radial gradients in 
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ellipticity induce biases in shear inferences when fit by models without gradients ( Bernstein||2010 ) 



The computational difficulty will grow, with an exponential increase in the number of likelihood 
evaluations, as additional parameters are needed in the model and hence in the prior and the in- 
tegrations This will lead us to favor the BFD method. Model-fitting will remain useful, however, 
in cases of incomplete pixel information on the galaxy, e.g. from cosmic rays, in which case the 
models can serve as a sparse dictionary for the pixel data. 



6.2. Moment compression 

The Bayesian Fourier Domain (BFD) method is model-free, in common with many moment- 



based schemes for shear inference, the best-known of which is from Kaiser et al. (1995, KSB). 
BFD shares with KSB and its brethren the approach that the galaxy pixel data can be compressed 
to a few simple moments which transmit most of the information on lensing shear of the image. 
The BFD method, however, differs fundamentally in explicit use of a prior high-S/N moment 
distribution, as opposed purely to summations over properties of the normal-S'/A^ images. There 
are fundamental differences between BFD and KSB that give the former a rigorous treatment of 
the effects of PSFs and noise: 

• The BFD method accumulates moments in Fourier domain, admitting an exact correction for 
the joint effects of the PSF and shear on the images, avoiding the need for KSB's Gaussian- 
based approximations. 

• The BFD method does not require an iterative centroiding procedure, meaning the full likeli- 
hood of the observed moments remains known even at low S/N. Positional uncertainties are 
treated by Bayesian integration over all possible positions of the galaxy. 

• The "polarizability" that KSB calculates for each galaxy is effectively replaced by integration 
of the target galaxy's moment likelihood over the shear derivatives of the prior. The BFD 
approach is exact in the presence of noise whereas the single-galaxy polarizabilities are biased 
by noise. A consequence of this is the absence of an ellipticity assignment to individual 
galaxies in BFD. 



The non-Bayesian method bearing the closest resemblance to BFD is described by Zhang (2011 ) 
and its predecessor papers. Like BFD, Zhang's method is equivalent to accumulating moments in 
Fourier domain, and emphasizes that a shear inferred from the ratio of two sums over the galaxy 
population — an average "signal" divided by an average "responsivity" — is less biased by noise than 
a shear inferred from a sum of single-galaxy ratios a la KSB. A critical difference is that Zhang's 
method takes moments of the power spectrum |I^(fc)| rather than the Fourier amplitude I{k). This 
makes the method insensitive to the choice of centroid. It also, however, amplifies the noise relative 
to the signal and the rectified noise must be very precisely removed from the power spectrum. 
Neither Zhang nor KSB have an intrinsic means of appropriately weighting the galaxies for the 
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shear information that they contain, which occurs naturally in the BFD method. This means they 
require some form of weighting or selection in order to keep the low-S/N galaxies from ruining the 
S/N of the shear inference. Weighting and selection can themselves induce biases in the inferred 
shear. Finally, Zhang's method is formulated only for a Gaussian weight function P^(|A;|), which 
formally requires integration over regions of /c-space with infinite noise in any real image. 

High shear fidelity on simulated images has been achieved by "stacking" methods, which sum 



galaxy images before analysis (Kuijken 1999 Lewis 2009). These work by essentially creating a 



single high-5/A^ image with an unlensed source that must approach circularity. Noise-induced 
biases are therefore avoided and the number of parameters needed to approximate the mean galaxy 
is greatly reduced. These stacking methods have practical drawbacks when the PSF and/or the 
lensing shear vary across the image. There is also a deeper problem in the need to assign an origin 
to each galaxy to build the stack. At this stage the galaxies' individual low S/N levels are still 
present and noise-induced biases, primarily a suppression of the shear inferred from the stack, are 
important. The BFD method will avoid this problem. 



7. Conclusion 

We have shown how the exact Bayesian formulation of shear inference from a collection of 
galaxy images can be turned into a practical measurement technique in the limit of weak shear, 
by accumulating each target galaxy's contribution to the first terms of the power-law expansion of 
the posterior In P{g\D). We can expect this rigorously derived treatment of noise to yield highly 
accurate shears even at low S/N, and also make it possible to suppress selection biases in the shear 
inference. We confirm the ability of the algorithm to provide part-per-thousand shear inference at 
low S/N, in the case of a highly simplified model of galaxy measurement. 

There are two clear routes to coupling the Bayesian treatment of noise with an exact treatment 
of the effects of the PSF and sampling on the image: Bayesian Model Fitting (BMF) assumes the 
target galaxies follow known parameterized forms, and the pixel data can be compared to models 
that have the instrumental effects applied. BMF needs to know the distribution of the unlensed 
population over the model parameters. The lensfit code has demonstrated the computational 
feasibility of this approach for the minimal galaxy models. Any model- fitting approach is only 
accurate, however, insofar as the real galaxies hew to the models. Given the infinite variety of real 
galaxies, there is an inherent approximation remaining in the BMF method, which needs to be 
evaluated. Model-fitting methods do however have the advantage of working even with incomplete 
or aliased pixel data for a galaxy. 

The second route, Bayesian Fourier Domain (BFD) inference, compresses both the target 
galaxies and the prior into a set of A;-space moments. Exact corrections for PSF and sampling 
are possible and the method is model-free if the images are fully sampled. The likelihood of the 
(compressed) data given the underlying moment vector is a well-defined multivariate Gaussian 
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in the common case of background-limited observations. We propose to apply the method to 
the zeroth, first, and second moments of the galaxies. Measuring these 6 moments will be fast 
and foolproof — no iteration is required. The prior distribution of moments will be 5-dimensional 
because of isotropy. We show how the prior can be constructed and integrated using empirical 
moment measurements on high-S/N images obtained from long integrations on a small subset of 
the sky area in a given survey. This integration is likely to be the most computationally intensive 
step of an implementation of the BFD method. The required size of this template subsample is an 
important issue for future study. 

Both the BMF and BFD methods easily treat the cases of multiple exposures of a single galaxy, 
spatially varying shear and PSFs, and are extensible to the inference of lensing magnification as well 
as shear. While we defer testing of implementations of these methods to a future publication, the 
absence of approximations in the derivation of these methods, particularly the BFD method, gives 
great hope of achieving shear inference at better than part-per-thousand accuracy, as is required 
for future ambitious weak lensing surveys. 
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A. Third-order posterior calculation 



Here we take the expansion for lnP{g\D) to third order in g, and calculate the perturbation to 



the maximum and mean of the posterior relative to the second order (Gaussian) result in Section 2.1 
We adopt a slightly different notation by first expanding the transformed prior as 



P(e°|5) = Po(e°e-ff) 



de' 



de" 



(Al) 



-Po(e) + q{e)g cos A(p 



+ Yko(e) + r2(e)cos2A</.] 

g3 

H [si (e) cos A(j) + S3 (e) cos 3 A0] . 

6 



(A2) 



Both the unlensed prior and the shear transformation formula must be invariant under rotation of 
coordinates, so the sheared prior must be a function only of the ellipticity amplitude e = |e°|, the 
shear amplitude g = \g\, and the angle Ai;^ between them. Sine terms are absent because the shear 
transformation law should be invariant under parity flip. 

The functions q,ro,r2,si, and S3 of galaxy ellipticity e can be expressed as algebraic combi- 
nations of derivatives of Pq and of the shear addition law. These expressions are complex and not 
enlightening, and in practice a numerical estimate is likely to be faster than the algebraic calculation 
anyway, so we omit the algebraic forms. Throughout this Appendix we also omit the additional 
galaxy parameters which will also be arguments of Pq and the five derivative functions. 



Adopting the complex notation g = gx + igy, and designating 
the posterior likelihood of shear for galaxy j becomes 



as the azimuthal angle of e. 



Pig\Di)xP, + Re 
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(A3) 
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After taking the logarithm of the posterior in (A3) to third order in g, we sum over galaxies 
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to get the (log of) the total shear posterior probability: 



lnP{g\D) 
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(A4) 
(A5) 

(A6) 
(A7) 
(A8) 
(A9) 



All sums are over the source galaxy index j. Note that in the absence of applied shear, and with a 
circularly symmetric PSF, there is no preferred direction in the e plane, and the expectation value 
of Q,R2,Si, and 5*3 are all zero. Only Ro, therefore, remains finite in this limit. The other sums 
will be at most 0{g), although we will retain all terms in case the PSF can break the symmetry of 
the unsheared observations. 



The posterior (A4) is simpler if we rotate to the frame where R2 is real and the covariance 



matrix is diagonal. If the original R2 has phase 2(p, then we set 
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As an aside, we note that if the measurement process (including PSF) has no preferred direction, 
then the applied shear g sets the only axis of the system (which is (p), and a consequence will be 
that (Q) = (Si) = {S3) = 0. 



At quadratic order, (A4) implies a Gaussian posterior distribution in g with both maximum 
and mean at 

Ro + \R2y Ro-\R2\. 



ixo,yo) = {-alQx,-alQy) 



(All) 



and independent errors on the components x and y of the shear in this rotated frame, equivalent 
to equations M and ([6|. We treat the cubic terms in (A4) as perturbations to the Gaussian 
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by assuming the 1 5*1^35^ | ^ 1 at any g for which the Gaussian hkehhood is significant. This 
approximation can be restated as 



\SQ^Ro^\ < 1, 



\SR, 



-3/2, 



<!. 



(A12) 
(A13) 



Under these conditions the posterior hkehhood in the rotated g frame can be expanded to first 
order in 81^3 giving 



P{g\D) ex exp 



2al 



exp 



-jy-yo) 

2al 



1 



6 



:Re 



Sl{x + iy) + Slix + iyf 



(A14) 



The cubic term shifts the peak of the Gaussian posterior shghtly, and adds some skew which shifts 
the expectation value further. The expectation value of shear under this posterior can be integrated 
to give 



(x) = Xo 



{y) = yo 



Sic, + S; 



3x 



al + xl 



Rq + I-R2I V ^ 

5'i^ - 353y / xpyp x 

Ro + \R2\ V 3 / 

Six - "iS^x f xoyo \ 



Six — S; 



3x 



<^l + yl 



Rq + I-R2I 



(A15) 



Rq 

Sly 



\R2\ V 3 
S-iy 



) 



Ro — \R2\ 



0,j. + Xq 



Sly — S^y 

Ro — \R2\ 



'<^l + yQ 



In each case, the a^ terms in parentheses arise from the skewness imposed on P{g\D). Omitting 
them gives the location of the maximum of the posterior. 

To obtain some understanding of this result, we take the case when the mean shear is deter- 
mined to high accuracy, so the skewness becomes unimportant. Since we expect yo,Siy, and S^y 
to be smaller than their x counterparts, and also |i?2| ^ Ro, the perturbation from cubic terms is 
expected to be dominated by a shift of the Gaussian likelihood to 



(x) 

(y) 



-Qa 



{Six + Ssx)Q, 



Rq + I-R2I 

~Qy 

Ro — I-R2I 



2{RQ + \R2\f 



(A16) 



The main effect of the cubic term is therefore a slight shift of the shear posterior toward or away 
from the origin. We have not verified whether this abbreviated form is sufficiently accurate in cases 



with anisotropic PSFs, so the full form (A15) should be used 



To summarize, the procedure for estimating the shear using terms to 3rd order in g is: 
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1. From the unlensed ellipticity distribution Pq and the shear transformation law, calculate the 



functions q,ro,r2,si, and S3 of e defined in equation (Al). 



2. For each source galaxy j, integrate over e as per equations (A3) to obtain Pj, Qj, Rqj, R2J, Sij, 
and Ssj. 



3. Sum over source galaxies as per equations (A5)-(A9) to get Q, Rq, R2, Si, and S3. 



4. Rotate by the phase of R2 to diagonalize the covariance matrix of the shear posterior and 
obtain transformed. Phases of the QRS components are changed and the shear component 



variances o"^ and cr^ are obtained as per equations (AlO). 



5. The maximum and mean of the quadratic solution in the rotated system are given by equa- 



tions (All). 



6. The perturbation to the expectation value of the posterior due to cubic terms is in equa- 



tions (A15). 



7. Rotate back to original coordinate system. 



B. Derivatives and transformations of the Fourier-domain moments 

To implement the Bayesian Fourier-domain shear technique of Section |4j we need to calculate 



the first two derivatives of the moment vector in Equation (25) with respect to shear for each 



template galaxy. It is also helpful to know how these moments change under rotation, translation, 
and parity transformations, since the symmetries of the unlensed sky suggest that any template 
galaxy can be replicated with these transformations. 



We start by writing the moments generally as 

M„= I cfki{k)W{\k'^\)Faik). 



(Bl) 



I{k) is the Fourier transform of the pre-seeing galaxy image, and is related to the sky-plane surface 
brightness I{x) by the usual 

i{k)= (fxl{x)ex.p{ik-x). (B2) 



Consider a new galaxy image which is an afRne transformation of the original image, specified by 
a linear amplification A followed by translation by xq: 



I'{x) = I {A-^x - xo) 



Standard Fourier manipulations give 



i'{k) 
k' 



A^fc. 



(B3) 

(B4) 
(B5) 
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The transformed moments are 



M^ = |A| I Ski{k')W{\k'^\)F^{k)e^^'-''^ 



(fki{k)W 



TN-i 



{A^)-k 



Fa 



(A^V'k 



ik-Xo 



(B6) 
(B7) 



B.l. Shear derivatives 

We define a two-component shear g = (51,92) of a galaxy image with the flux-conserving 
transformation 

1 / 1 + 51 92 \ 

Vi-52 V ^2 i-gi )' 



A-1 



(B8) 



It is convenient to adopt a complex notation at this point: 

r,_ d . d 
ogi dg2 



9 = 91+ W2 

With this notation the action of shear k — )• (A ) k becomes 

k ^ k' = {l-gg)-^l^{k + gk) 



H 9 d 

091 092 



(B9) 



(BIO) 



Equation (B7) can now be restated in the complex notation for the case of a shear transformation: 

M'^= f Ski{k)W{k'k')Fa{k') (Bll) 



We are interested in the 2 scalar and 2 complex moments defined as 



Mo = Mi 

Mi = Mx + iMy 

M2 = M+ + iMx 
Mr 

The shear derivative operators can be rewritten as 



Vg = vd + vd 



To2 , r,„-T52 



Fi =ik 
F2 = k^ 
Fr = kk. 



Wo = vv^ d^ + vv' d^ + 2l29a 



1 
1 



(B12) 

(B13) 
(B14) 
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Now the derivatives of the moments with respect to shear are obtained by applying these operators 



to the moment definition (Bll) after substituting in the shear wavevector transformation (BIO) 



For each moment, the derivatives can be expressed as 

VgM„ = / d^k i{k) [W{kk)Aaik) + W'{kk)Ba{k)] (B15) 

VgVgMa = f (fk i{k) [W{kk)Ca{k) + W'{kk)Dc,{k) + W"{kk)Eo,{k)\ . (B16) 

Table [T] summarizes the results of propagating the shear derivatives into our weight and moment 
functions. All of the moments and their derivatives are simple weighted polynomial moments of 
the galaxy Fourier transform. 

B.2. Rotation transformations 

In our adopted complex notation, the effect of rotating the galaxy by angle (j) is to send 



k — )• e^'^k in the argument of F^ in Equation (B7). The monopole moments Mq and M^ are 
unchanged, while the dipole and quadrupole moments Mi and M2 acquire factors e^'^ and e^*''^, 
respectively. The rotational behavior of all the shear derivatives of the moments can also be easily 
assessed by applying the phase factors to all powers of k and k in the elements of Table [ij 



B.3. Parity transformations 

A parity flip sends k ^ k for all of the integrands of the moments and their derivatives in 
Table [T] The moments Mi and M2 are conjugated, meaning that My and Mx change sign, while 
the other moments are unchanged. 



B.4. Derivatives with translation 

A translation of the galaxy by xq adds a factor e ° to I{k) in the integrand of all the 
moments (and their derivatives). In general the integrations of all the moments and their shear 
derivatives would need to be repeated. 

We may, however, elect to approximate the effect of translation on the moments and their 
shear derivatives by linearizing about a^o = 0. In this case we are interested in ^q^, etc. The 
derivative of any moment or its derivatives with respect to xq can be obtained by adding a factor 
of ikx = i{k + k)/2 to all the functions in Table 111 Similarly the yo derivative adds a factor 
iky = {k — k)/2 to the integrands. 

If we adopt the linearization of the moments with translation, then for each template galaxy 
we need a substantial number of real-valued quantities: (6 moments) x (1 moment + 5 derivatives 
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for Q and R) x (1 value + 2 translation derivatives) for a total of 108 numbers per template. 
The actual number of integrals we need to perform is much less than this since many elements are 
repeated in Table [TJ Recall too that these template moments that define the prior only need to be 
evaluated once for the experiment, and the integrations are expressed as standard linear algebra 
routines. Evaluation of these quantities will not be a significant computational burden. 
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Table 1. Functional forms of the integrands for moments and their derivatives, as defined by 



Equations (B16). The derivatives of the moments under translate in x and y directions are found 
by adding factors of i{k + A:)/2 and {k — k)/2 to the entries, respectively. 



Moment 


Mo 


Ml 


Ma 


Mr 


F^ = 


1 


ik 


A;2 


kk 


Aa = VX 

+VX 






ik 



2kk 



P 
P 


Ba = VX 

+VX 


P 
P 


ikP 
iP 


k^P 


kP 
Pk 


+vv'^x 






ik 



2P 
2P 


Akk 



+vv'^x 
+vv'^x 


Akk 




6iPk 

2iP 




SPk 

4kP 




8PP 
2P 
2P 



E, 



t = I2X 


2p~e 


2iPP 


2PP 


2PP 


+vv'^ X 


¥ 


ikP 


P¥ 


kP 


+vv^x 
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